Search results for "Cycles per instruction"
Multi-objective optimisations for a superscalar architecture with selective value prediction
2012
This work extends an earlier manual design space exploration of the Selective Load Value Prediction based superscalar architecture we developed to the L2 unified cache. We then perform an automatic design space exploration, using a specially developed software tool, by varying several architectural parameters. Our goal is to find optimal configurations in terms of CPI (Cycles per Instruction) and energy consumption. By varying 19 architectural parameters, as we proposed, the design space exceeds 2.5 million billion configurations, which means that only heuristic search is feasible. Therefore, we propose different methods of automatic design space exploration based…
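Optimising CPI and energy at the same time means looking for a Pareto front: configurations that no other configuration beats on both objectives at once. The Python sketch below shows only the dominance test that multi-objective heuristics rely on. It is not the authors' tool, and the (CPI, energy) values in the usage example are hypothetical.

```python
from typing import List, Tuple

# A configuration is scored by two objectives, both minimised: (CPI, energy).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points not dominated by any other point (simple O(n^2) pairwise check)."""
    # A point never dominates itself, so comparing against the whole list is safe.
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (CPI, energy) scores for five configurations.
configs = [(1.2, 5.0), (1.0, 6.0), (1.3, 4.0), (1.25, 5.5), (1.0, 7.0)]
print(pareto_front(configs))
```

Here (1.25, 5.5) is dropped because (1.2, 5.0) is better on both objectives, and (1.0, 7.0) is dropped because (1.0, 6.0) matches its CPI with lower energy. The brute-force check is fine for small samples. The point of a heuristic search over a space of this size is that it only ever evaluates a tiny subset of configurations.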
Versatile Direct and Transpose Matrix Multiplication with Chained Operations: An Optimized Architecture Using Circulant Matrices
2016
With growing demands in real-time control, classification, and prediction, algorithms are becoming more complex while low-power, small-size devices are required. Matrix multiplication (direct or transpose) is common in such algorithms. Many algorithms also require repeated matrix multiplication, where the result of one multiplication is multiplied again. This work describes a versatile computation procedure and architecture: one of the matrices is stored in internal memory in its circulant form, so that a sequence of direct or transpose multiplications can then be performed without timing penalty. The architecture proposes a RAM-ALU block for each matrix c…
A 16 channel high resolution (<11 ps RMS) Time-to-Digital Converter in a Field Programmable Gate Array
2012
A 16-channel Time-to-Digital Converter (TDC) was implemented in a general-purpose Field-Programmable Gate Array (FPGA). The fine time calculations are achieved using the dedicated carry-chain lines, while a coarse counter provides the coarse time stamp. To overcome the negative effects of temperature and power supply dependency, bin-by-bin calibration is applied. The time interval measurements are made using 2 channels. The time resolution of the channels is calculated over 1 clock cycle: a minimum of 10.3 ps RMS is achieved for the two-channel measurement, yielding 7.3 ps RMS (10.3 ps/√2) for a single channel.
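Bin-by-bin calibration of a carry-chain TDC is often done with a code-density test. Hits that are uniformly distributed relative to the clock land in each delay bin in proportion to that bin's width, so a histogram of bin counts gives the bin widths directly. The Python sketch below assumes that method, a fine time measured forward from the clock edge that latched the coarse count, and hypothetical function names. The abstract does not state the paper's exact procedure. The last function reproduces the √2 step: if two independent channels have equal jitter, the RMS of their difference is √2 times the single-channel RMS.

```python
import math
from typing import List


def calibration_table(hist: List[int], t_clk_ps: float) -> List[float]:
    """Code-density calibration: each bin width is proportional to its hit count.

    Returns the calibrated time (ps) of each bin centre within one clock period.
    """
    total = sum(hist)
    widths = [t_clk_ps * h / total for h in hist]
    centres = []
    edge = 0.0
    for w in widths:
        centres.append(edge + w / 2)  # use the bin centre as its calibrated value
        edge += w
    return centres


def timestamp_ps(coarse: int, fine_bin: int, table: List[float], t_clk_ps: float) -> float:
    # Assumed convention: fine time runs forward from the edge that latched `coarse`.
    return coarse * t_clk_ps + table[fine_bin]


def single_channel_rms(two_channel_rms_ps: float) -> float:
    # For independent channels with equal jitter: sigma_diff = sqrt(2) * sigma_single.
    return two_channel_rms_ps / math.sqrt(2)


table = calibration_table([1, 3, 0, 4], t_clk_ps=8.0)  # toy histogram, hypothetical clock
print(table, timestamp_ps(2, 1, table, 8.0))
print(round(single_channel_rms(10.3), 1))  # prints 7.3
```

A bin that never receives a hit gets zero width, which is how code-density calibration absorbs the very uneven bins typical of FPGA carry chains.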
Improving Computing Systems Automatic Multiobjective Optimization Through Meta-Optimization
2016
This paper presents an extension of the Framework for Automatic Design Space Exploration (FADSE) tool using a meta-optimization approach, which improves the performance of design space exploration algorithms by driving two different multiobjective meta-heuristics concurrently. More precisely, we selected two genetic multiobjective algorithms, 1) non-dominated sorting genetic algorithm-II (NSGA-II) and 2) strength Pareto evolutionary algorithm 2 (SPEA2), which work together to improve both the solutions’ quality and the convergence speed. With the proposed improvements, we ran FADSE to optimize the hardware parameters’ values of the grid ALU processor (GAP) micro-architecture from a b…
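NSGA-II ranks solutions that sit in the same non-dominated front by crowding distance. Solutions in sparsely populated regions of objective space are preferred, which keeps the front spread out. The Python sketch below implements that standard component on its own. It is not FADSE's code, and it says nothing about how the two algorithms are coordinated in the paper. The sample front is hypothetical.

```python
import math
from typing import List, Sequence


def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """NSGA-II crowding distance for the points of one non-dominated front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for k in range(len(front[0])):  # one pass per objective
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        # Boundary points are always kept by giving them infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue  # all equal in this objective: no spread to measure
        for r in range(1, n - 1):
            i = order[r]
            # Normalised gap between the two neighbours along objective k.
            dist[i] += (front[order[r + 1]][k] - front[order[r - 1]][k]) / (hi - lo)
    return dist


front = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.5), (4.0, 1.0)]
print(crowding_distance(front))
```

In this example the second point gets a larger distance (about 1.5) than the third (about 1.0), so when the population must be truncated NSGA-II would keep the second point in preference to the third.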
Synthesizing on a reconfigurable chip an autonomous robot image processing system
2003
This paper deals with the implementation of an entire log-polar image processing system in a high-density reconfigurable device. Log-polar vision reduces the amount of data to be stored and processed, simplifying several vision algorithms and making it possible to implement a complete processing system on a single chip. This image processing system is especially suited to autonomous robotic navigation, since these platforms typically have power consumption, size and weight restrictions. Furthermore, the image processing algorithms involved are time-consuming and often also have real-time constraints. A reconfigurable approach on a single chip combines hardwar…
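The data reduction in log-polar vision comes from sampling the image on rings whose radii grow geometrically from the centre, with a fixed number of angular sectors per ring. The fovea is therefore sampled densely and the periphery sparsely, and the output size is rings x sectors whatever the input resolution. The Python sketch below uses nearest-neighbour sampling and a hypothetical `log_polar_sample` function with assumed parameters. The paper's hardware mapping is not described in the abstract.

```python
import math
from typing import List


def log_polar_sample(img: List[List[int]], n_rings: int, n_sectors: int,
                     r_min: float = 1.0) -> List[List[int]]:
    """Nearest-neighbour log-polar sampling around the image centre.

    Row u of the result is ring u and column v is sector v. Assumes 0 < r_min <= r_max.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = min(cx, cy)  # largest ring that stays inside the image
    out = []
    for u in range(n_rings):
        # Geometric radii: r_u = r_min * (r_max / r_min) ** (u / (n_rings - 1)).
        r = r_min * (r_max / r_min) ** (u / (n_rings - 1)) if n_rings > 1 else r_min
        row = []
        for v in range(n_sectors):
            theta = 2 * math.pi * v / n_sectors
            y = int(round(cy + r * math.sin(theta)))
            x = int(round(cx + r * math.cos(theta)))
            row.append(img[y][x])
        out.append(row)
    return out


# 9 x 9 test image whose pixel value encodes its position: value = 9 * y + x.
image = [[9 * y + x for x in range(9)] for y in range(9)]
lp = log_polar_sample(image, n_rings=3, n_sectors=4)
print(len(image) * len(image[0]), "pixels ->", len(lp) * len(lp[0]), "samples")
```

Here 81 pixels shrink to 12 samples. With realistic ring and sector counts, the same idea is what lets a complete processing pipeline fit on a single chip.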